查看原文
其他

单细胞分析十八般武艺12:metacell

Kinesin 生信会客厅 2022-08-16

      单细胞测序技术的发展日新月异,新的分析工具也层出不穷。每个工具都有它的优势与不足,在没有权威工具和流程的单细胞生信江湖里,多掌握几种分析方法和工具,探索数据时常常会有意想不到的惊喜。


往期专题

单细胞转录组基础分析专题

单细胞转录组高级分析专题


metacell简介

单细胞分析最基础且非常关键的一步是细胞聚类,我们期望把相同类型(cell type)的细胞聚在一起,最好还能让不同状态(cell state)的细胞容易区分。我在项目分析过程中尝试过很多种方法的组合,例如分大群的时候用PCA+KNN+SNN,分亚群的时候会尝试NMF+最大loading值,或者使用主题模型的方法。最近有朋友介绍了一种全新的方法——metacell,不少高分文章用到了此方法,我自己实测效果也挺好。


Github主页及官方教程

https://github.com/tanaylab/metacell

https://tanaylab.github.io/metacell/articles/a-basic_pbmc8k.html


应用案例

  • Li et al., Cell 2018 (scRNA-seq of immune cells from human melanoma tumors)

  • Giladi et al., Nat Cell Biol 2018 (Mouse hematopoiesis scRNA-seq)

  • Sebe-pedros et al., NEE 2018 (whole organisms scRNA-seq)

  • Sebe-pedros et al., Cell 2018 (whole organism scRNA-seq)

  • Bornstein et al., Nature 2018 (thymic stroma scRNA-seq)

  • Cohen et al., Cell 2018 (lung scRNA-seq)


分析原理

metacell有一套自己的数据处理逻辑,它要求输入原始的UMI counts矩阵。数据输入后它会对细胞和基因进行过滤,然后选择用于计算细胞相似性的特征基因(类似于seurat的高变基因),之后基于这些特征基因的表达值构建平衡KNN图。它最核心的算法是将平衡KNN图重复抽样(默认一次抽取75%的细胞)聚类,然后将多次重复中都能聚在一起的细胞确定为metacell,最后会根据lfp值过滤离群值得到最终的metacell。metacell内部包含的cells被认为是同质化的,不同的metacell之间可能是不同cell type,也可能是用一个cell type不同的cell state,总而言之,就是metacell之间是异质的。


安装metacell

if (!require("BiocManager")) install.packages('BiocManager')BiocManager::install("tanaylab/metacell")


metacell实践

metacell在质控方面相比seurat有所欠缺,最后得到的metacell注释起来也非常繁琐。我在metacell实践经验的基础上对原流程做了比较大调整,质控及特征基因的选择采用seurat的流程,生成metacell的核心的步骤原汁原味地保留,最后的metacell注释又回到seurat常用的cluster注释方法。


数据来源

数据来自Immune Landscape of Viral- and Carcinogen-Driven Head and Neck Cancer,数据集GEO编号:GSE139324。建议大家自己下载,也可以加Kinesin微信获取数据的百度云链接。


第一步:创建seurat对象并质控

library(metacell)library(Seurat)library(tidyverse)library(patchwork)rm(list = ls())###### 分析初始化dir.create("metacell")setwd("metacell")# 设置存放数据的目录if(!dir.exists("scdb")) dir.create("scdb")scdb_init("scdb", force_reinit=T) # 设置存放图形的目录if(!dir.exists("figs")) dir.create("figs")scfigs_init("figs")
###### 创建seurat对象fn <- paste0("GSE139324/", c("GSM4138110", "GSM4138111", "GSM4138162", "GSM4138168"))sn <- c('HNC01PBMC', 'HNC01TIL', 'PBMC1', 'Tonsil1')scRNAlist <- list()for(i in 1:length(fn)){ counts <- Read10X(data.dir = fn[i]) #不设置min.cells过滤基因会导致CellCycleScoring报错: #Insufficient data values to produce 24 bins. scRNAlist[[i]] <- CreateSeuratObject(counts, project=sn[i], min.cells=2, min.features = 200) #给细胞barcode加个前缀,防止合并后barcode重名 scRNAlist[[i]] <- RenameCells(scRNAlist[[i]], add.cell.id = sn[i]) #计算线粒体基因比例 if(T){ scRNAlist[[i]][["percent.mt"]] <- PercentageFeatureSet(scRNAlist[[i]], pattern = "^MT-") } #计算核糖体基因比例 if(T){ scRNAlist[[i]][["percent.rb"]] <- PercentageFeatureSet(scRNAlist[[i]], pattern = "^RP[SL]") } #计算红细胞基因比例 if(T){ HB.genes <- c("HBA1","HBA2","HBB","HBD","HBE1","HBG1","HBG2","HBM","HBQ1","HBZ") HB.genes <- CaseMatch(HB.genes, rownames(scRNAlist[[i]])) scRNAlist[[i]][["percent.HB"]]<-PercentageFeatureSet(scRNAlist[[i]], features=HB.genes) }}scRNA <- merge(scRNAlist[[1]], scRNAlist[2:length(scRNAlist)])###### 数据质控minGene=500maxGene=4000maxUMI=15000pctMT=10pctHB=1scRNA <- subset(scRNA, subset = nCount_RNA < maxUMI & nFeature_RNA > minGene & nFeature_RNA < maxGene & percent.mt < pctMT & percent.HB < pctHB)

第二步:seurat转metacell

###### 构建metacell对象## 提取高变基因scRNA <- SCTransform(scRNA)var.genes <- VariableFeatures(scRNA)var.genes <- structure(rep(1:length(var.genes)), names=var.genes)var.genes <- gset_new_gset(sets = var.genes, desc = "seurat variable genes")scdb_add_gset("pbmc", var.genes)
## 提取counts矩阵sce = as.SingleCellExperiment(scRNA)mat = scm_import_sce_to_mat(sce)scdb_add_mat("pbmc", mat)
###### 构建平衡KNN图mcell_add_cgraph_from_mat_bknn(mat_id = "pbmc", gset_id = "pbmc", graph_id = "pbmc_k100", K = 100, dsamp = F) # 20,000 cells之内不必抽样###### 生成metacell#### 共聚类mcell_coclust_from_graph_resamp(coc_id = "pbmc_n1000", graph_id = "pbmc_k100", min_mc_size = 20, p_resamp = 0.75, n_resamp=1000)
#### 生成初级metacellmcell_mc_from_coclust_balanced(coc_id = "pbmc_n1000", mat_id = "pbmc", mc_id = "pbmc", K = 20, min_mc_size = 20, alpha = 2)
#### 修剪metacellmcell_plot_outlier_heatmap(mc_id = "pbmc", mat_id = "pbmc", T_lfc = 3)mcell_mc_split_filt(new_mc_id = "pbmc", mc_id = "pbmc", mat_id = "pbmc", T_lfc = 3, plot_mats = T)
#### metacel配色mc_f <- scdb_mc("pbmc")my.col <- c("darkgray","burlywood1","chocolate4","orange","red","purple","blue","darkgoldenrod3","cyan")mc_f@colors <- colorRampPalette(my.col)(ncol(mc_f@mc_fp))scdb_add_mc("pbmc", mc_f)mc_f <- scdb_mc("pbmc")
#### Markers热图(热图显示的log2(lfp)值)mcell_gset_from_mc_markers(gset_id = "pbmc_markers", mc_id = "pbmc")# 单细胞水平mcell_mc_plot_marks(mc_id = "pbmc", gset_id = "pbmc_markers", mat_id = "pbmc", plot_cells = T)# metacell水平mcell_mc_plot_marks(mc_id = "pbmc", gset_id = "pbmc_markers", mat_id = "pbmc", plot_cells = F)
#### 2D图展示Cells与MCs#download.file("http://www.wisdom.weizmann.ac.il/~arnau/metacell_data/Amphimedon_adult/config.yaml","config.yaml")#tgconfig::override_params("config.yaml","metacell") #使用这个就不用下面的2-3行代码设置参数mcell_mc2d_force_knn(mc2d_id="pbmc", mc_id="pbmc", graph_id="pbmc_k100")tgconfig::set_param("mcell_mc2d_height", 1000, "metacell")tgconfig::set_param("mcell_mc2d_width", 1000, "metacell")mcell_mc2d_plot(mc2d_id = "pbmc")

Marker基因lfp值热图,行是基因,列是metacell编号


metacell的细胞降维图

第三步:metacell结果导入seurat

#### metacell数据转为seurat对象source("~/sc_script/function.R")# 上一行代码导入的是不宜公开的内部函数sco <- mc2sco(mat_id = "pbmc", mc_id = "pbmc", mc2d_id = "pbmc", min.features = 2)sco <- NormalizeData(sco)sco <- SCTransform(sco)sco <- RunPCA(sco, verbose = F)sco <- RunHarmony(sco, group.by.vars="orig.ident", assay.use="SCT")sco <- FindNeighbors(sco, reduction = "harmony", dims = 1:30) %>% FindClusters()sco <- RunUMAP(sco, reduction = "harmony", dims = 1:30)
#### SingleR鉴定细胞类型load("~/database/SingleR_ref/ref_Hematopoietic.RData")sco <- cell_identify2(sco, ref_Hematopoietic)
## 降维图对比p1 <- DimPlot(sco)p2 <- DimPlot(sco, reduction = "mcsc", group.by = "mc_id")p <- p1|p2ggsave("embedding_comparison.png", p, width = 16, height = 6.5)
## 批次效应对比p1 <- DimPlot(sco, group.by = "orig.ident")p2 <- DimPlot(sco, reduction = "mcsc", group.by = "orig.ident")p <- p1|p2ggsave("batches_comparison.png", p, width = 16, height = 6.5)

seurat与metacell降维结果对比


批次效应处理结果对比


## 分群效果对比markers <- c('CD3D','CD3E','CD4','IL2RA','FOXP3','CTLA4','CD8A','CD8B','MS4A1','CD79A', 'GNLY','NKG7','PPBP','CLEC4C','FCER1A','CD14','FCGR3A','S100A8')
### 散点图p <- FeaturePlot(sco, features = markers, ncol = 3)ggsave("seurat_markers_featureplot.png", p, width = 12, height = 18)
p <- FeaturePlot(sco, reduction = "mcsc", features = markers, ncol = 3)ggsave("metacell_markers_featureplot.png", p, width = 12, height = 18)
### 小提琴图p <- VlnPlot(sco, features = markers, group.by = "mc_id", stack = T, flip = T)ggsave("metacell_markers_vlnplot.png", p, width = 18, height = 12)
p <- VlnPlot(sco, features = markers, group.by = "seurat_clusters", stack = T, flip = T)ggsave("seurat_markers_vlnplot.png", p, width = 18, height = 12)

Seurat处理结果展示



metacell处理结果展示

p1 <- plot_mc_sc(sco, "pbmc") + NoLegend()p2 <- DimPlot(sco, reduction = "mcsc", group.by = "SingleR", label = T) + NoLegend()p3 <- DimPlot(sco, reduction = "mcsc", group.by = "seurat_clusters", label = T) + NoLegend()
p = p1|p2ggsave("metacell_celltype.png", p, width = 15, height = 6.5)p = p1|p3ggsave("metacell_cluster.png", p, width = 15, height = 6.5)
p1 <- DimPlot(sco, reduction = "umap", group.by = "SingleR", label = T) + NoLegend()p2 <- DimPlot(sco, reduction = "umap", group.by = "seurat_clusters", label = T) + NoLegend()p = p1|p2ggsave("celltype_cluster_umap.png", p, width = 15, height = 6.5)

Celltype与metacell的对应关系

seurat_cluster与metacell的对应关系

seurat_cluster与SingleR鉴定结果的对应关系



交流探讨

如果您阅读此文有所疑惑,或有不同见解,亦或其他问题,欢迎添加我的微信探讨。我工作的公司2016年开始单细胞测序服务,至今已完成近万例样本的单细胞测序,服务质量经过了10X Genomics公司的权威认证。欢迎大家联系Kinesin洽谈单细胞测序及数据分析业务!



您可能也对以下帖子感兴趣

文章有问题?点此查看未经处理的缓存